The power of multiple testing procedures can be increased by using weighted p-values 
(Genovese, Roeder and Wasserman 2005). We derive the optimal weights and we 
show that the power is remarkably robust to misspecification of these weights. We 
consider two methods for choosing weights in practice. The first, external weighting, is 
based on prior information. The second, estimated weighting, uses the data to choose 
weights. 
1 Introduction 

The power of multiple testing procedures can be increased by using weighted p-values (Genovese, Roeder and Wasserman 2005). Dividing each p-value $P$ by a weight $w$ increases the probability of rejecting some hypotheses. Provided the weights have mean one, familywise error control methods and false discovery control methods maintain their frequentist error control guarantees. 

The first such weighting scheme appears to be Holm (1979). Related ideas are in Benjamini and Hochberg (1997) and Chen et al (2000). There are, of course, other ways to improve power aside from weighting. Some notable recent approaches include Rubin, van der Laan and Dudoit (2005), Storey (2005), Donoho and Jin (2004) and Signoravitch (2006). Of these, our approach is closest to Rubin, van der Laan and Dudoit (2005), hereafter RVD. In fact, the optimal weights derived here, if re-expressed as cutoffs for test statistics, turn out to be identical to the cutoffs derived in RVD. Our main contributions beyond RVD are (i) a careful study of potential power losses due to departures from the optimal weights, (ii) robustness properties of weighted methods, and (iii) recovering power after using data splitting to estimate the weights. An important distinction between this paper and RVD versus Storey (2005) is that Storey uses a slightly different loss function and he requires a common cutoff for all test statistics. This allows him to make an elegant connection with the Neyman-Pearson lemma. In particular, his method automatically adapts from one-sided testing to two-sided testing depending on the configuration of means. Signoravitch (2006) uses invariance arguments to find powerful test statistics for multiple testing when the underlying tests are multivariate. 

In this paper we show that the optimal weights form a one parameter family. We also show 

the power is very robust to misspecification of the weights. In particular, we show that (i) sparse 

'Research supported by National Institute of Mental Health grants MH057881,MH066278, MH06329 and NSF 
Grant AST 0434343. The authors thank Jamie Robins for helping us to clarify several issues. 
Figure 1: Power gain/loss for weighting a single hypothesis. In this example, an unweighted 
hypothesis has power 1/2. The weights are $w_0 < 1 < w_1$ with $w_1/w_0 = B$. The top line shows 
the power when the alternative is given the correct weight $w_1$. The bottom line, which is nearly 
indistinguishable from 1/2, shows the power when the alternative is given the incorrect weight $w_0$. 
As $B$ increases, the power gain increases sharply while the power loss remains nearly constant. 



weights (a few large weights and minimum weight close to 1) lead to huge power gain for well 
specified weights, but minute power loss for poorly specified weights; and (ii) in the non-sparse 
case, under weak conditions, the worst case power loss for poorly specified weights is typically 
better than the power using equal weights. In fact, the power is degraded at most by a factor 
of about $\gamma/(1 - a)$ where $a$ is the fraction of nonnulls and $\gamma$ is the fraction of nulls that are 
mistaken for alternatives. Figure 1 shows the sparse case. The top line shows power from correct 
weighting while the bottom line shows power from incorrect weighting. We see that the power 
gains overwhelm the potential power loss. Figure 2 shows the non-sparse case. The plots on the 
left show the power as a function of the alternative mean $\xi$. The dark solid line shows the lowest 
possible power assuming the weights were estimated as poorly as possible. The lighter solid line 
is the power of the unweighted (Bonferroni) method. The dotted line shows the power under 
theoretically optimal weights. The worst case weighted power is typically close to or larger than 
the Bonferroni power except for large $\xi$ when they are both large. 

We consider two methods for choosing the weights: (i) external weights, where prior informa- 
tion (based on scientific knowledge or prior data) singles out specific hypotheses and (ii) estimated 
weights where the data are used to construct weights. External weights are prone to bias while 
estimated weights are prone to variability. The two robustness properties reduce concerns about 
bias and variance. 

An example of external weighting is the following. We have test statistics $\{T_j : j = 1, \ldots, m\}$ 
Figure 2: Power as a function of the alternative mean $\xi$. In these plots, $a = .01$, $m = 1000$ 
and $\alpha = 0.05$. There are $(1 - a)m$ nulls and $ma$ alternatives with mean $\xi$. The left plots show 
what happens when the weights are incorrectly computed assuming that a fraction $\gamma$ of nulls are 
actually alternatives with mean $u$. In the top plot, we restrict $0 < u < \xi$. In the second and third 
plot, no restriction is placed on $u$. The top and middle plot have $\gamma = .1$ while the third plot has 
$\gamma = 1 - a$ (all nulls misspecified as alternatives). The dark solid line shows the lowest possible 
power assuming the weights were estimated as poorly as possible. The lighter solid line is the 
power of the unweighted (Bonferroni) method. The dotted line is the power under the optimal 
weights. The vertical line in the top plot is at $\xi_*$. The weighted method beats unweighted for all 
$\xi < \xi_*$. The right plot shows the least favorable $u$ as a function of $\xi$. That is, mistaking $\gamma m$ nulls 
for alternatives with mean $u$ leads to the worst power. Also shown is the line $u = \xi$. 



associated with spatial locations $\{s_j : j = 1, \ldots, m\}$ where $s_j \in [0, L]$, say. These could be 
association tests for markers on a genome. The number of tests $m$ is large, on the order of 100,000 
for example. Each $T_j$ is used to test the null hypothesis that $\theta_j = E(T_j) = 0$. Prior data is 
in the form of a smooth stochastic process $\{Z(s) : s \in [0, L]\}$. This might be from a whole 
genome linkage scan. At alternatives, the mean $\mu(s) = E(Z(s))$ is a large positive value; however, 
due to correlation, at nulls close to alternatives, $\mu(s)$ is also non-zero. Peaks in the process $Z(s)$ 
provide approximate information about the location of alternatives. We want to use the process $Z$ 
to generate reasonable weights for the test statistics. 

When external weights are not available, the optimal weights can be estimated from the data. 
One approach is to use data splitting (RVD) using a fraction of the data to estimate the weights and 
the remainder to test. For example, consider the two-stage genome-wide association study (e.g., 
Thomas et al. 2005) for which a sample of n subjects is split into two subsets. Using the first subset, 
we obtain test statistics $\{T_j : j = 1, \ldots, m\}$ associated with locations $\{s_j : j = 1, \ldots, m\}$. 
Typically only the second subset of data are used in the final analysis. Building on the ideas of 
Skol et al. (2006), we take the two-stage study design further, exploring how the first set of data 
can be utilized to formulate weights, and the full data set can be used for testing. 

2 Weighted Multiple Testing 

We are given hypotheses $H = (H_1, \ldots, H_m)$ and standardized test statistics $T = (T_1, \ldots, T_m)$ 
where $T_j \sim N(\xi_j, 1)$. (The methods can be extended to nonnormal test statistics but we do not 
consider that case here.) For a two-sided hypothesis, $H_j = 1$ if $\xi_j \neq 0$ and $H_j = 0$ otherwise. 
For the sake of parsimony, unless otherwise noted, results will be stated for a one-sided test where 
$H_j = 1$ if $\xi_j > 0$, although the results extend easily to the two-sided case. Let $\theta = (\xi_1, \ldots, \xi_m)$ 
denote the vector of means. 

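The original data are often of the form 

$$X = \begin{pmatrix} X_{11} & X_{12} & \cdots & X_{1m} \\ X_{21} & X_{22} & \cdots & X_{2m} \\ \vdots & \vdots & & \vdots \\ X_{n1} & X_{n2} & \cdots & X_{nm} \end{pmatrix} \tag{1}$$

where the $j$th test statistic $T_j$ is based on the $j$th column of $X$. Usually, $T_j$ is of the form $T_j = \sqrt{n_j}\,\bar{X}_j/\sigma_j$ where $\bar{X}_j$ is approximately (or exactly) $N(\mu_j, \sigma_j^2/n_j)$ and the noncentrality parameter is $\xi_j = \sqrt{n_j}\,\mu_j/\sigma_j$. 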



The p-values associated with the tests are $P = (P_1, \ldots, P_m)$ where $P_j = \bar\Phi(T_j)$, $\bar\Phi = 1 - \Phi$ 
and $\Phi$ denotes the standard Normal CDF. Let 

$$P_{(1)} \le \cdots \le P_{(m)}$$

denote the sorted p-values and let 

$$T_{(1)} \ge \cdots \ge T_{(m)}$$

denote the sorted test statistics. 

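A rejection set $\mathcal{R}$ is a subset of $\{1, \ldots, m\}$. Say that $\mathcal{R}$ controls familywise error at level $\alpha$ if 
$P(|\mathcal{R} \cap \mathcal{H}_0| > 0) \le \alpha$ where $\mathcal{H}_0 = \{j : H_j = 0\}$. The Bonferroni rejection set is 

$$\mathcal{R} = \{j : P_j < \alpha/m\} = \{j : T_j > z_{\alpha/m}\} \tag{2}$$

where we use the notation $z_\beta = \bar\Phi^{-1}(\beta)$. 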

The weighted Bonferroni procedure of Genovese, Roeder and Wasserman (2005) is as follows. 
Specify nonnegative weights $w = (w_1, \ldots, w_m)$ and reject hypothesis $H_j$ if 

$$j \in \mathcal{R} = \left\{ j : \frac{P_j}{w_j} \le \frac{\alpha}{m} \right\}. \tag{3}$$

As long as $m^{-1} \sum_j w_j = 1$, the rejection set $\mathcal{R}$ controls familywise error at level $\alpha$. For completeness, we provide the proof. (All further proofs are in the appendix.) 

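Lemma 2.1 If $m^{-1} \sum_j w_j = 1$, then the rejection set $\mathcal{R}$ controls familywise error at level $\alpha$. 

Proof. The familywise error is 

$$P(|\mathcal{R} \cap \mathcal{H}_0| > 0) = P\left(P_j \le \frac{\alpha w_j}{m} \text{ for some } j \in \mathcal{H}_0\right) \le \sum_{j \in \mathcal{H}_0} P\left(P_j \le \frac{\alpha w_j}{m}\right) = \frac{\alpha}{m} \sum_{j \in \mathcal{H}_0} w_j \le \alpha \bar{w} = \alpha. \qquad \blacksquare$$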
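The rejection rule (3) is short enough to state in code. The following is a minimal sketch, not the authors' implementation: the function name `weighted_bonferroni` and the choice to rescale the supplied weights to mean one (rather than reject unnormalized input) are assumptions made for illustration.

```python
from typing import List, Sequence


def weighted_bonferroni(pvalues: Sequence[float],
                        weights: Sequence[float],
                        alpha: float = 0.05) -> List[int]:
    """Indices j rejected by the rule P_j / w_j <= alpha / m of equation (3)."""
    m = len(pvalues)
    if len(weights) != m:
        raise ValueError("pvalues and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative")
    mean_w = sum(weights) / m
    if mean_w <= 0:
        raise ValueError("at least one weight must be positive")
    rejected = []
    for j, (p, w) in enumerate(zip(pvalues, weights)):
        w_norm = w / mean_w  # assumption: rescale so the weights average to one
        # Written as P_j <= w_j * alpha / m so that a zero weight never rejects.
        if w_norm > 0 and p <= w_norm * alpha / m:
            rejected.append(j)
    return rejected
```

With equal weights the function reduces to the ordinary Bonferroni rule; up-weighting one hypothesis raises its threshold at the expense of the others.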

Genovese, Roeder and Wasserman (2005) also showed that false discovery methods benefit by 
weighting. Recall that the false discovery proportion (FDP) is 

$$\mathrm{FDP} = \frac{\text{number of false rejections}}{\text{number of rejections}} = \frac{|\mathcal{R} \cap \mathcal{H}_0|}{|\mathcal{R}|} \tag{4}$$

where the ratio is defined to be 0 if the denominator is 0. The false discovery rate (FDR) is 
$\mathrm{FDR} = E(\mathrm{FDP})$. Benjamini and Hochberg (1995) proved $\mathrm{FDR} \le \alpha$ if $\mathcal{R} = \{j : P_j \le T\}$ 
where $T = \max\{P_{(j)} : P_{(j)} \le j\alpha/m\}$. Genovese, Roeder and Wasserman (2004) showed that 
$\mathrm{FDR} \le \alpha$ if the $P_j$'s are replaced by $Q_j = P_j/w_j$ as long as $m^{-1} \sum_j w_j = 1$ as before. This paper 
will focus only on familywise error. Similar results hold for FDR and will be in a followup paper. 



3 Power and Optimality 

3.1 Power 

Before weighting, that is using weight 1, the power of a single, one-sided alternative is 

$$\pi(\xi_j, 1) = P(T_j > z_{\alpha/m}) = \bar\Phi(z_{\alpha/m} - \xi_j). \tag{5}$$

The power$^2$ in the weighted case is 

$$\pi(\xi_j, w_j) = P\left(P_j < \frac{\alpha w_j}{m}\right) = P\left(T_j > \bar\Phi^{-1}\left(\frac{\alpha w_j}{m}\right)\right) = \bar\Phi\left(\bar\Phi^{-1}\left(\frac{\alpha w_j}{m}\right) - \xi_j\right). \tag{6}$$

Weighting increases the power when $w_j > 1$ and decreases the power when $w_j < 1$. 
Given $\theta = (\xi_1, \ldots, \xi_m)$ and $w = (w_1, \ldots, w_m)$ we define the average power 

$$\frac{1}{m} \sum_{j=1}^m \pi(\xi_j, w_j)\, I(\xi_j > 0). \tag{7}$$

More generally, if $\xi$ is drawn from a distribution $Q$ and $w = w(\xi)$ is a weight function we define 
the average power 

$$\int \pi(\xi, w(\xi))\, I(\xi > 0)\, dQ(\xi). \tag{8}$$

If we take $Q$ to be the empirical distribution of $(\xi_1, \ldots, \xi_m)$ then this reduces to the previous 
expression. In the general case we require $w(\xi) \ge 0$ and $\int w(\xi)\,dQ(\xi) = 1$. 
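The power formulas (5) to (7) can be evaluated with the Python standard library alone. This is a sketch under stated assumptions: the helper names `upper_tail`, `upper_quantile`, `weighted_power` and `average_power` are hypothetical, and the defaults $\alpha = 0.05$, $m = 1000$ simply mirror the values used in the figures.

```python
import math
from statistics import NormalDist
from typing import Sequence

_STD_NORMAL = NormalDist()


def upper_tail(x: float) -> float:
    """Phi-bar(x) = 1 - Phi(x), computed with erfc for accuracy in the tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def upper_quantile(p: float) -> float:
    """z_p = Phi-bar^{-1}(p) for 0 < p < 1."""
    return -_STD_NORMAL.inv_cdf(p)


def weighted_power(xi: float, w: float, alpha: float = 0.05, m: int = 1000) -> float:
    """One-sided power pi(xi, w) of equation (6); w = 1 gives equation (5)."""
    if w <= 0:
        return 0.0  # a zero weight can never reject
    level = alpha * w / m
    if level >= 1.0:
        return 1.0  # the threshold is so generous that the test always rejects
    return upper_tail(upper_quantile(level) - xi)


def average_power(xis: Sequence[float], weights: Sequence[float],
                  alpha: float = 0.05) -> float:
    """Average power of equation (7): only alternatives (xi > 0) contribute."""
    m = len(xis)
    return sum(weighted_power(x, w, alpha, m)
               for x, w in zip(xis, weights) if x > 0) / m
```

For instance, an alternative with mean $z_{\alpha/m}$ has power exactly 1/2 at weight 1, which is the marginal case used repeatedly in the robustness discussion.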

3.2 Optimality and Robustness 

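In the following theorem we see that the optimal weight functions form a one parameter 
family indexed by a constant $c$. 

Theorem 3.1 Given $\theta = (\xi_1, \ldots, \xi_m)$, the optimal weight vector $w = (w_1, \ldots, w_m)$ that maximizes the average power subject to $w_j \ge 0$ and $m^{-1} \sum_{j=1}^m w_j = 1$ is $w = (\rho_c(\xi_1), \ldots, \rho_c(\xi_m))$ 
where 

$$\rho_c(\xi) = \frac{m}{\alpha}\, \bar\Phi\left(\frac{\xi}{2} + \frac{c}{\xi}\right) I(\xi > 0) \tag{9}$$

$^2$ For a two-sided alternative the power is 

$$\pi(\xi_j, w_j) = \bar\Phi\left(\bar\Phi^{-1}\left(\frac{\alpha w_j}{2m}\right) - \xi_j\right) + \bar\Phi\left(\bar\Phi^{-1}\left(\frac{\alpha w_j}{2m}\right) + \xi_j\right).$$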
Figure 3: Optimal weight function $\rho_c(\xi)$ for various $c$. In each case $m = 1000$ and $\alpha = 0.05$. The 
functions are normalized to have maximum 1. 



and $c \equiv c(\theta)$ is defined by the condition 

$$\frac{1}{m} \sum_{j=1}^m \rho_c(\xi_j) = 1. \tag{10}$$



The proof is in the appendix. Some plots of the function $\rho_c(\xi)$ for various values of $c$ are 
shown in Figure 3. In these plots, the function is normalized to have maximum 1 for easier visualization. The result generalizes to the case where the alternative means are random variables with 
distribution $Q$ in which case $c$ is defined by 

$$\int \rho_c(\xi)\, dQ(\xi) = 1. \tag{11}$$
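Because the mean of the weights $\rho_c(\xi_j)$ is decreasing in $c$, the constant in (10) can be found by bisection. The sketch below assumes this numerical approach, which the paper does not prescribe; the names `optimal_weights` and `upper_tail`, the starting bracket, and the iteration count are illustrative choices.

```python
import math
from typing import List, Sequence, Tuple


def upper_tail(x: float) -> float:
    """Phi-bar(x) = 1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def optimal_weights(xis: Sequence[float],
                    alpha: float = 0.05) -> Tuple[float, List[float]]:
    """Solve (10) for c by bisection and return (c, weights rho_c(xi_j))."""
    m = len(xis)
    positive = [x for x in xis if x > 0]
    if not positive:
        raise ValueError("need at least one positive mean")

    def mean_weight(c: float) -> float:
        # (1/m) * sum_j (m/alpha) * Phi-bar(xi_j/2 + c/xi_j), decreasing in c
        return sum(upper_tail(x / 2.0 + c / x) for x in positive) / alpha

    lo, hi = -1.0, 1.0
    while mean_weight(lo) < 1.0:  # widen until the mean weight at lo is >= 1
        lo *= 2.0
    while mean_weight(hi) > 1.0:  # widen until the mean weight at hi is <= 1
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_weight(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    c = 0.5 * (lo + hi)
    weights = [(m / alpha) * upper_tail(x / 2.0 + c / x) if x > 0 else 0.0
               for x in xis]
    return c, weights
```

When all alternatives share one mean and make up a fraction $a$ of the tests, the routine returns weight $1/a$ on each alternative and 0 on each null, matching the closed form used in the examples of Section 3.2.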



Remark. Rejecting when $P_j/w_j \le \alpha/m$ is the same as rejecting when $Z_j > \xi_j/2 + c/\xi_j$. This 
is identical to the result of Rubin, van der Laan and Dudoit (2005), obtained independently. The 
remainder of the paper, which shows some good properties of the weighted method, can thus also 
be considered as providing support for their method. In particular, they noted in their simulations 
that even poorly specified estimates of the cutoffs $\xi_j/2 + c/\xi_j$ can still perform well. This paper 
provides insight into why that is true. 
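The equivalence in the Remark can be checked in one line. The display below is a sketch that writes $T_j$ for the statistic the Remark calls $Z_j$ (an identification assumed here) and uses only $P_j = \bar\Phi(T_j)$, the weights (9) for $\xi_j > 0$, and the fact that $\bar\Phi$ is decreasing:

```latex
\begin{align*}
\frac{P_j}{w_j} \le \frac{\alpha}{m}
  &\iff \bar\Phi(T_j) \le \frac{\alpha}{m}\,\rho_c(\xi_j)
   = \bar\Phi\!\left(\frac{\xi_j}{2} + \frac{c}{\xi_j}\right) \\
  &\iff T_j \ge \frac{\xi_j}{2} + \frac{c}{\xi_j}.
\end{align*}
```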

From (6) and (9) we have immediately: 

Lemma 3.2 The power at an alternative with mean $\xi$ under optimal weights is $\bar\Phi(c/\xi - \xi/2)$. The 
average power under optimal weights, which we call the oracle power, is 

$$\frac{1}{m} \sum_{j=1}^m \bar\Phi\left(\frac{c}{\xi_j} - \frac{\xi_j}{2}\right) I(\xi_j > 0). \tag{12}$$

The oracle power is not attainable since the optimal weights depend on $\theta = (\xi_1, \ldots, \xi_m)$ or, 
equivalently, on $Q$. In practice, the weights will either be chosen by prior information or by estimating the $\xi_j$'s. This raises the following question: how sensitive is the power to correct specification 
of the weights? Now we show that the power is very robust to weight misspecification. 

The weights themselves can be very sensitive to changes in $\theta$. Consider the following example. 
Suppose that $\theta = (\xi_1, \ldots, \xi_m)$ where each $\xi_j$ is equal to either 0 or some fixed number $\xi$. The 
empirical distribution of the $\xi_j$'s is thus $Q = (1 - a)\delta_0 + a\delta_\xi$ where $\delta$ denotes a point mass and $a$ 
is the fraction of nonzero means. The optimal weights are 0 for $\xi_j = 0$ and $1/a$ for $\xi_j = \xi$. Let 
$\tilde{Q} = (1 - a - \gamma)\delta_0 + \gamma\delta_u + a\delta_\xi$ where $u$ is a small positive number. Since we have only moved 
the mass at 0 to $u$, and $u$ is small, we would hope that $w(\xi)$ will not change much. But this is not 
the case. Setting 

$$\xi = A + \sqrt{A^2 - 2c}, \qquad u = B - \sqrt{B^2 - 2c} \tag{13}$$

where 

$$A = \bar\Phi^{-1}\left(\frac{\alpha}{m(\gamma K + a)}\right), \qquad B = \bar\Phi^{-1}\left(\frac{K\alpha}{m(\gamma K + a)}\right) \tag{14}$$

yields weights $w_0$ and $w_1$ on $u$ and $\xi$ such that $w_0/w_1 = K$. For example, take $m = 1000$, 
$\alpha = 0.05$, $a = .1$, $\gamma = .1$, $K = 1000$, and $c = .1$. Then $u = .03$ and $\xi = 9.8$. The optimal weight 
on $\xi$ under $Q$ is 10 but under $\tilde{Q}$ it is .00999 and so is reduced by a factor of 1001. More generally 
we have the following result which shows that the weights are, in a certain sense, a discontinuous 
function of $\theta$. 
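The numbers in this example can be reproduced directly. The sketch below assumes the reading of (13) and (14) in which the weight on $\xi$ is $w_1 = 1/(\gamma K + a)$ and the weight on $u$ is $w_0 = K w_1$, so that $\gamma w_0 + a w_1 = 1$; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def upper_tail(x: float) -> float:
    """Phi-bar(x) = 1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def upper_quantile(p: float) -> float:
    """z_p = Phi-bar^{-1}(p)."""
    return -NormalDist().inv_cdf(p)


def rho(x: float, c: float, m: int, alpha: float) -> float:
    """Optimal weight function of equation (9) for x > 0."""
    return (m / alpha) * upper_tail(x / 2.0 + c / x)


def sensitivity_example(m: int = 1000, alpha: float = 0.05, a: float = 0.1,
                        gamma: float = 0.1, K: float = 1000.0, c: float = 0.1) -> dict:
    """Means (xi, u) whose optimal weights under Q-tilde have ratio w0 / w1 = K."""
    w1 = 1.0 / (gamma * K + a)   # weight on xi; follows from gamma*w0 + a*w1 = 1
    w0 = K * w1                  # weight on the spurious mean u
    A = upper_quantile(alpha * w1 / m)
    B = upper_quantile(alpha * w0 / m)
    xi = A + math.sqrt(A * A - 2.0 * c)   # larger root of x/2 + c/x = A
    u = B - math.sqrt(B * B - 2.0 * c)    # smaller root of x/2 + c/x = B
    return {"xi": xi, "u": u, "w0": w0, "w1": w1}
```

Calling `sensitivity_example()` with the defaults gives $u$ close to .03, $\xi$ close to 9.8 and $w_1$ close to .00999, in agreement with the text, and `rho` evaluated at the returned means recovers the two weights.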

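Lemma 3.3 Fix $\alpha$ and $m$. For any $\delta > 0$ and $\epsilon > 0$ there exists $Q = (1 - a)\delta_0 + a\delta_\xi$ and 
$\tilde{Q} = (1 - a - \gamma)\delta_0 + \gamma\delta_u + a\delta_\xi$ such that 

$$d(Q, \tilde{Q}) < \delta, \quad \text{and} \quad \frac{\tilde\rho(\xi)}{\rho(\xi)} < \epsilon \tag{15}$$

where $a = \alpha/4$, $d(Q, \tilde{Q}) = \sup_\xi |Q(-\infty, \xi] - \tilde{Q}(-\infty, \xi]|$ is the Kolmogorov-Smirnov distance, $\rho$ 
is the optimal weight function for $Q$ and $\tilde\rho$ is the optimal weight function for $\tilde{Q}$. 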

Fortunately, this problem is not serious since it is possible to have high power even with poor 
weights. In fact, the power of the weighted method has the following two robustness properties: 




Property I: Sparse weights (minimum weight close to 1) are highly robust. If most weights are 
less than 1 and the minimum weight is close to 1 then correct specification (large weights on 
alternatives) leads to large power gains but incorrect specification (large weights on nulls) leads to 
little power loss. 

Property II: Worst case analysis. Weighted hypothesis testing, even with poorly chosen weights, 
typically does as well or better than Bonferroni except when the alternative means are large, in 
which case both have high power. 

Let us now make these statements precise. Also, see Genovese, Roeder and Wasserman 
(2006) and Roeder, Bacanu, Wasserman and Devlin (2006) for other results on the effect of weight 
misspecification. 

Property I. Consider first the case where the weights take two distinct values and the alternatives 
have a common mean $\xi$. Let $\epsilon$ denote the fraction of hypotheses given the larger of the two values 
of the weights $B$. Then, the weight vector $w$ is proportional to 

$$(\underbrace{B, \ldots, B}_{k \text{ terms}}, \underbrace{1, \ldots, 1}_{m-k \text{ terms}})$$

where $k = \epsilon m$ and $B > 1$ and hence the normalized weights are 

$$w = (\underbrace{w_1, \ldots, w_1}_{k \text{ terms}}, \underbrace{w_0, \ldots, w_0}_{m-k \text{ terms}})$$

where 

$$w_1 = \frac{B}{\epsilon B + (1 - \epsilon)}, \qquad w_0 = \frac{1}{\epsilon B + (1 - \epsilon)}.$$

We say that the weights are sparse if $\epsilon$ is small, that is, if most weights are near 1. 

Consider an alternative with mean $\xi$. The power gain by correct weighting is the power under 
weight $w_1$ minus the unweighted power, $\pi(\xi, w_1) - \pi(\xi, 1)$. Similarly, the power loss for incorrect 
weighting is $\pi(\xi, 1) - \pi(\xi, w_0)$. The gain minus the loss, which we call the robustness function, is 

$$\begin{aligned} R(B, \epsilon) &= (\pi(\xi, w_1) - \pi(\xi, 1)) - (\pi(\xi, 1) - \pi(\xi, w_0)) && (16) \\ &= \bar\Phi(z_{\alpha w_1/m} - \xi) + \bar\Phi(z_{\alpha w_0/m} - \xi) - 2\bar\Phi(z_{\alpha/m} - \xi). && (17) \end{aligned}$$

The gain outweighs the loss if and only if $R(B, \epsilon) > 0$. This is illustrated in Figures 1 and 4. 
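The robustness function (17) is easy to evaluate numerically. The following sketch uses hypothetical helper names and the figure defaults $\alpha = 0.05$, $m = 1000$; it is meant only to make the sign behaviour concrete.

```python
import math
from statistics import NormalDist


def upper_tail(x: float) -> float:
    """Phi-bar(x) = 1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def upper_quantile(p: float) -> float:
    """z_p = Phi-bar^{-1}(p)."""
    return -NormalDist().inv_cdf(p)


def robustness(B: float, eps: float, xi: float,
               alpha: float = 0.05, m: int = 1000) -> float:
    """R(B, eps) of equation (17): power gain minus power loss at mean xi."""
    denom = eps * B + (1.0 - eps)
    w1, w0 = B / denom, 1.0 / denom  # normalized two-valued weights

    def power(w: float) -> float:
        return upper_tail(upper_quantile(alpha * w / m) - xi)

    return power(w1) + power(w0) - 2.0 * power(1.0)
```

At the marginal effect $\xi = z_{\alpha/m}$ with $B = 10$, the function is positive for a sparse scheme such as $\epsilon = 0.01$ and negative for $\epsilon = 0.5$, which is the pattern Figure 4 describes.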
Figure 4: Robustness function for $m = 1000$. In this example, $\xi = z_{\alpha/m}$ which has power 1/2 
without weighting. The gain of correct weighting far outweighs the loss for incorrect weighting as 
long as the fraction of large weights $\epsilon$ is small. 

Theorem 3.4 Fix $B > 1$. Then, $\lim_{\epsilon \to 0} R(B, \epsilon) > 0$. Moreover, there exists $\epsilon^*(B) > 0$ such that 
$R(B, \epsilon) > 0$ for all $\epsilon < \epsilon^*(B)$. 

We can generalize this beyond the two-valued case as follows. Let $w$ be any weight vector such 
that $m^{-1} \sum_j w_j = 1$. Now define the (worst case) robustness function 

$$R(\xi) = \min_{\{j:\, w_j > 1,\, H_j = 1\}} \{\pi(\xi, w_j) - \pi(\xi, 1)\} - \max_{\{j:\, w_j < 1,\, H_j = 1\}} \{\pi(\xi, 1) - \pi(\xi, w_j)\}. \tag{18}$$

We will see that $R(\xi) > 0$ under weak conditions and that the maximal robustness is obtained for 
$\xi$ near the Bonferroni cutoff $z_{\alpha/m}$. 

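Theorem 3.5 A necessary and sufficient condition for $R(\xi) > 0$ is 

$$R_{b,B}(\xi) \equiv \bar\Phi(z_{\alpha B/m} - \xi) + \bar\Phi(z_{\alpha b/m} - \xi) - 2\bar\Phi(z_{\alpha/m} - \xi) > 0 \tag{19}$$

where $B = \min\{w_j : w_j > 1\}$, $b = \min\{w_j\}$. Moreover, 

$$R_{b,B}(\xi) = \Delta(\xi) + O(1 - b) \tag{20}$$

where 

$$\Delta(\xi) = \bar\Phi(z_{\alpha B/m} - \xi) - \bar\Phi(z_{\alpha/m} - \xi) > 0 \tag{21}$$

and, as $b \to 1$, $\mu(\{\xi : R(\xi) < 0\}) \to 0$ and $\inf_\xi R(\xi) \to 0$. 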
The theorem is illustrated in Figure 5. We see that there is overwhelming robustness as long as 
the minimum weight is near 1. Even in the extreme case $b = 0$, there is still a safe zone, an interval 
of values of $\xi$ over which $R(\xi) > 0$. 

Lemma 3.6 Suppose that $B > 2$. Then there exists $\xi_* > 0$ such that $R_{b,B}(\xi) > 0$ for all $0 < \xi < \xi_*$ and all $b$. An upper bound on $\xi_*$ is $z_{\alpha/m} - 1/(z_{\alpha/m} - z_{B\alpha/m})$. 



Property II. Even if the weights are not sparse, the power of the weighted test cannot be too bad 
as we now show. To begin, assume that each mean is either equal to 0 or $\xi$ for some fixed $\xi > 0$. 
Thus, the empirical distribution is 

$$Q = (1 - a)\delta_0 + a\delta_\xi \tag{22}$$

where $\delta$ denotes a point mass and $a$ is the fraction of nonzero $\xi_j$'s. The optimal weights are $1/a$ for 
hypotheses whose mean is $\xi$. To study the effect of misspecification error, consider the case where 
$b = \gamma m$ nulls are mistaken for alternatives with mean $u > 0$. This corresponds to misspecifying $Q$ 
to be 

$$\tilde{Q} = (1 - a - \gamma)\delta_0 + \gamma\delta_u + a\delta_\xi. \tag{23}$$

We will study the effect of varying $u$ so let $\pi(u)$ denote the power at the true alternative $\xi$ as 
a function of $u$. Also, let $\pi_{\mathrm{Bonf}}$ denote the power using equal weights (Bonferroni). Note that 
changing $Q = (1 - a)\delta_0 + a\delta_\xi$ to $Q = (1 - a)\delta_0 + a\delta_{\xi'}$ for $\xi' \neq \xi$ does not change the weights. 

As the weights are a function of $c$, we first need to find $c$ as a function of $u$. The normalization 
condition (11) reduces to 

$$\gamma\, \bar\Phi\left(\frac{u}{2} + \frac{c}{u}\right) + a\, \bar\Phi\left(\frac{\xi}{2} + \frac{c}{\xi}\right) = \frac{\alpha}{m} \tag{24}$$

which implicitly defines the function $c(u)$. First we consider what happens when $u$ is restricted to 
be less than $\xi$. 

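Theorem 3.7 Assume that $\alpha/m < \gamma + a < 1$. Let $Q = (1 - a)\delta_0 + a\delta_\xi$ and $\tilde{Q} = (1 - a - \gamma)\delta_0 + \gamma\delta_u + a\delta_\xi$ with $0 < u < \xi$. Let $C(\xi) = \sup_{0 < u < \xi} c(u)$ and define $\xi_0 = z_{\alpha/(m(\gamma + a))}$. 

1. For $\xi \le \xi_0$, 

$$C(\xi) = \xi \xi_0 - \xi^2/2. \tag{25}$$

For $\xi > \xi_0$, $C(\xi)$ is the solution to 

$$\gamma\, \bar\Phi(\sqrt{2c}) + a\, \bar\Phi\left(\frac{c}{\xi} + \frac{\xi}{2}\right) = \frac{\alpha}{m}. \tag{26}$$

In this case, $C(\xi) = z^2_{\alpha/(m\gamma)}/2 + O(a)$. 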
Figure 5: The robustness function $R(\xi)$ for several values of $b = \min_j w_j$. In each case, $m = 1000$, 
$\alpha = 0.05$, $B = 10$. Whenever $R(\xi) > 0$, power gain outweighs power loss. When $b$ is near 1, 
$R(\xi) > 0$ for most $\xi$. Even when $b = 0$ there is a safe zone including $\xi = 0$ as long as $B > 2$. 
2. Let 

^ = z a/m +^z 2 a/m -z 2 , where q= —— . (27) 

For $\xi \le \xi_*$, 

$$\inf_{0 < u < \xi} \pi(u) \ge \pi_{\mathrm{Bonf}}. \tag{28}$$

^or ( > ^ we /zave 

o<«<c" v "' ~~ " \ 2£* 



iu f n ( u ) > $( z "/(™-y) e * | _ 0(a ) (29 ) 



1-$/ W 2 log— J- O(a) (30) 

> l--^--O(a). (31) 

\ — a 

The factor $ I J2 log — - J ~ 7/(1 — a) is the worst case power deficit due to misspecification. 
Now we drop the assumption that $u < \xi$. 

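Theorem 3.8 Let $Q = (1 - a)\delta_0 + a\delta_\xi$ and let $Q_u = (1 - a - \gamma)\delta_0 + \gamma\delta_u + a\delta_\xi$. Let $\pi_u$ denote the 
power at $\xi$ using the weights computed under $Q_u$. 

1. The least favorable $u$ is 

$$u_* = \operatorname{argmin}_{u \ge 0} \pi_u = \sqrt{2c_*} = z_{\alpha/(m\gamma)} + O(a) \tag{32}$$

where $c_*$ solves 

$$\gamma\, \bar\Phi(\sqrt{2c}) + a\, \bar\Phi\left(\frac{\xi}{2} + \frac{c}{\xi}\right) = \frac{\alpha}{m} \tag{33}$$

and $c_* = z^2_{\alpha/(m\gamma)}/2 + O(a)$. 

2. The minimal power is 

$$\inf_u \pi_u = \bar\Phi\left(\frac{c_*}{\xi} - \frac{\xi}{2}\right) = \bar\Phi\left(\frac{z^2_{\alpha/(m\gamma)} - \xi^2}{2\xi}\right) + O(a). \tag{34}$$

3. A sufficient condition for $\inf_u \pi_u$ to be larger than the power of the Bonferroni method is 

$$\xi \le z_{\alpha/m} + \sqrt{z^2_{\alpha/m} - z^2_{\alpha/(m\gamma)}} + O(a). \tag{35}$$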
4 Choosing External Weights 

In choosing external weights, we will focus here on the two-valued case. Thus, 

$$w = (\underbrace{w_1, \ldots, w_1}_{k \text{ terms}}, \underbrace{w_0, \ldots, w_0}_{m-k \text{ terms}}) \tag{36}$$

where $k = \epsilon m$, $w_1 = B/(\epsilon B + (1 - \epsilon))$ and $w_0 = 1/(\epsilon B + (1 - \epsilon))$. In practice, we would typically 
have a fixed fraction of hypotheses $\epsilon$ that we want to give more weight to. The question is how 
to choose $B$. We will focus on choosing $B$ to produce weights with good properties at interesting 
values of $\xi$. Now large values of $\xi$ already have high power. Very small values of $\xi$ have extremely 
low power and benefit little by weighting. This leads us to focus on constructing weights that are 
useful for a marginal effect, defined as the alternative $\xi_m$ that has power 1/2 when given weight 1. 
Thus, the marginal effect is $\xi_m = z_{\alpha/m}$. In the rest of this section then we assume that all nonzero 
$\xi_j$'s are equal to $\xi_m$. Of course, the validity of the procedure does not depend on this assumption 
being true. 

Fix $0 < \epsilon < 1$ and vary $B$. As we increase $B$, we will eventually reach a point $B_0(\epsilon)$ where 
$R(B, \epsilon) < 0$, which we call the turnaround point. Formally, 

$$B_0(\epsilon) = \sup\{B : R(B, \epsilon) > 0\}. \tag{37}$$

The top panel in Figure 6 shows $B_0(\epsilon)$ versus $\epsilon$, which shows that for small $\epsilon$ we can choose $B$ large 
without loss of power. The bottom panel shows $R(B, \epsilon)$ for $\epsilon = 0.1$. We suggest using $B = B_*(\epsilon)$, 
the value of $B$ that maximizes $R(B, \epsilon)$. 

Theorem 4.1 Fix $0 < \epsilon < 1$. As a function of $B$, $R(B, \epsilon)$ is unimodal and satisfies $R(1, \epsilon) = 0$, 
$R'(1, \epsilon) > 0$ and $R(\infty, \epsilon) < 0$. Hence, $B_0(\epsilon)$ exists and is unique. Also, $R(B, \epsilon)$ has a unique 
maximum at some point $B_*(\epsilon)$ and $R(B_*(\epsilon), \epsilon) > 0$. 
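Both $B_*(\epsilon)$ and $B_0(\epsilon)$ can be located numerically by exploiting the unimodality stated in Theorem 4.1. The sketch below is one possible approach, not the authors' procedure: it scans a log-spaced grid for the maximizer and then bisects for the zero crossing beyond it. The grid size, the upper limit `b_max` and all function names are assumptions.

```python
import math
from statistics import NormalDist
from typing import Tuple


def upper_tail(x: float) -> float:
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def upper_quantile(p: float) -> float:
    return -NormalDist().inv_cdf(p)


def robustness(B: float, eps: float, alpha: float = 0.05, m: int = 1000) -> float:
    """R(B, eps) of equation (17) evaluated at the marginal effect xi_m = z_{alpha/m}."""
    xi_m = upper_quantile(alpha / m)
    denom = eps * B + (1.0 - eps)

    def power(w: float) -> float:
        return upper_tail(upper_quantile(alpha * w / m) - xi_m)

    return power(B / denom) + power(1.0 / denom) - 2.0 * power(1.0)


def best_and_turnaround(eps: float, alpha: float = 0.05, m: int = 1000,
                        b_max: float = 1e6, steps: int = 2000) -> Tuple[float, float]:
    """Return (B_star, B_0): grid maximizer of R and the point where R crosses zero."""
    grid = [math.exp(math.log(b_max) * i / steps) for i in range(1, steps + 1)]
    values = [robustness(b, eps, alpha, m) for b in grid]
    i_best = max(range(steps), key=values.__getitem__)
    if values[i_best] <= 0:
        raise ValueError("R is not positive anywhere on the grid")
    # First grid point past the maximizer where R has become negative.
    i_neg = next((i for i in range(i_best, steps) if values[i] < 0), None)
    if i_neg is None:
        return grid[i_best], math.inf  # no turnaround below b_max
    lo, hi = grid[i_neg - 1], grid[i_neg]
    for _ in range(100):  # bisection on the sign change of R
        mid = 0.5 * (lo + hi)
        if robustness(mid, eps, alpha, m) > 0:
            lo = mid
        else:
            hi = mid
    return grid[i_best], 0.5 * (lo + hi)
```

The grid maximizer is only accurate to the grid spacing, which is adequate for choosing a weight ratio in practice; a finer search around it could be added if more precision were needed.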

When $\epsilon$ is very small, we can essentially choose $B$ as large as we like. For example, suppose 
we want to increase the chance of rejecting one particular hypothesis so that $\epsilon = 1/m$. Then, 

$$w_1 = \frac{mB}{B + m - 1} \approx B, \qquad w_0 = \frac{m}{B + m - 1} \approx 1$$

and 

lim lim 7r(£j,iUi) = 1, while lim lim tt(^j,wq) = -. 

See Figure 1. 

The next results show that binary weighting schemes are optimal in a certain sense. Suppose 
we want to have at least a fraction $\epsilon$ with high power $1 - \beta$ and otherwise we want to maximize 
the minimum power. 
Figure 6: Top plot: $B_0(\epsilon)$ versus $\epsilon$. Bottom plot shows $R(B, .1)$ versus $B$. The turnaround point 
$B_*(\epsilon)$ is shown with a vertical dotted line. 
Theorem 4.2 Consider the following optimization problem: Given $0 < \epsilon < 1$ and $0 < \beta < 1/2$, 
find a vector $w = (w_1, \ldots, w_m)$ that maximizes 

$$\min_j \pi(\xi_m, w_j)$$

subject to 

$$\frac{\#\{j : \pi(\xi_m, w_j) \ge 1 - \beta\}}{m} \ge \epsilon.$$

The solution is given by 

$$w = (\underbrace{w_1, \ldots, w_1}_{k \text{ terms}}, \underbrace{w_0, \ldots, w_0}_{m-k \text{ terms}}) \tag{38}$$

where $w_1 = B/(\epsilon B + (1 - \epsilon))$, $w_0 = 1/(\epsilon B + (1 - \epsilon))$, $k = \epsilon m$, $B = cm(1 - \epsilon)/(\alpha - \epsilon c m)$ and 
$c = \bar\Phi(z_{\alpha/m} + z_{1-\beta})$. 

If our goal is to maximize the number of alternatives with high power while maintaining a 
minimum power loss, the solution is given as follows. 

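Theorem 4.3 Consider the following optimization problem: Given $0 < \beta < 1/2$, find a vector 
$w = (w_1, \ldots, w_m)$ that maximizes 

$$\#\{j : \pi(w_j, \xi_m) \ge 1 - \beta\} \tag{39}$$

subject to 

$$\bar{w} = 1, \quad \text{and} \quad \min_j \pi(w_j, \xi_m) \ge \delta. \tag{40}$$

The solution is 

$$w = (\underbrace{w_1, \ldots, w_1}_{k \text{ terms}}, \underbrace{w_0, \ldots, w_0}_{m-k \text{ terms}}) \tag{41}$$

where 

$$w_1 = \frac{m}{\alpha} \bar\Phi(z_{\alpha/m} + z_{1-\beta}), \qquad w_0 = \frac{m}{\alpha} \bar\Phi(z_{\alpha/m} + z_\delta), \qquad \epsilon = \frac{1 - w_0}{w_1 - w_0} \tag{42}$$

and $k = m\epsilon$. 

A special case that falls under this Theorem permits the minimum power to be 0. In this case 
$w_0 = 0$ and $\epsilon = 1/w_1$. 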
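The closed form (42) can be evaluated directly. The sketch below assumes the reading of (42) in which $w_1$ delivers power $1 - \beta$ and $w_0$ delivers power $\delta$ at the marginal effect $\xi_m = z_{\alpha/m}$; the function name `two_valued_weights` is hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple


def upper_tail(x: float) -> float:
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def upper_quantile(p: float) -> float:
    return -NormalDist().inv_cdf(p)


def two_valued_weights(beta: float, delta: float,
                       alpha: float = 0.05, m: int = 1000) -> Tuple[float, float, float]:
    """Return (w1, w0, eps) from equation (42); delta = 0 gives w0 = 0."""
    z_m = upper_quantile(alpha / m)  # marginal effect xi_m
    w1 = (m / alpha) * upper_tail(z_m + upper_quantile(1.0 - beta))
    w0 = (m / alpha) * upper_tail(z_m + upper_quantile(delta)) if delta > 0 else 0.0
    eps = (1.0 - w0) / (w1 - w0)     # makes eps*w1 + (1 - eps)*w0 = 1
    return w1, w0, eps
```

For example, with $\beta = 0.2$ and $\delta = 0.4$ at $m = 1000$, roughly 3% of the hypotheses can be given the large weight while every other hypothesis keeps power at least 0.4 at the marginal effect.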

5 Estimated Weights 

In this section we explain how to use the data to estimate the weights. There are two issues: 
we must ensure that the error is still controlled and avoid incurring large losses of power due to 
replacing $\theta$ with an estimator $\hat\theta$. 
5.1 Validity With Estimated Weights 

Data Splitting. The approach taken by RVD for ensuring that the error control is preserved relies 
on data splitting. This approach relies on normalized test statistics $T_j^{(1)}, T_j^{(2)}$ based on a partition 
of the data into subsets $X^{(1)}, X^{(2)}$ which include fractions $b$ and $(1 - b)$ of $X$, respectively. Note 
that $T_j = b^{1/2} T_j^{(1)} + (1 - b)^{1/2} T_j^{(2)}$, $j = 1, \ldots, m$. The training data $X^{(1)}$ is used to estimate the 
noncentrality parameter of the standardized statistic $T_j^{(1)}$, where $E[T_j^{(1)}] = \sqrt{b}\,\xi_j \equiv \xi_j^{(1)}$. Testing 
is conducted using the remaining fraction of the data $X^{(2)}$. Consequently $\hat\xi_j^{(1)}$ must be rescaled by 
$r_s = ((1 - b)/b)^{1/2}$ to estimate the noncentrality parameter of the standardized statistic $T_j^{(2)}$, i.e., 
$\hat\xi_j = r_s \hat\xi_j^{(1)}$. The estimated weights are $w_j(T^{(1)}) = \rho_c(\hat\xi_j)$. Because of the independence between 
the two portions of the data, familywise error is controlled at the nominal level. 

Lemma 5.1 The procedure that rejects when $P(T_j^{(2)}) \le w_j(T^{(1)})\alpha/m$ controls the familywise 
error at level $\alpha$. 

Recovering Power. As noted by Skol et al. (2006), data splitting incurs a loss of power because 
the p-values are computed using only a fraction of the data. To recover this lost power, we need to use 
all the data to compute the p-values. When using this approach $\hat\xi_j^{(1)}$ must be rescaled by $r_f = b^{-1/2}$ 
to estimate the noncentrality parameter of the standardized statistic $T_j$, i.e., $\hat\xi_j = r_f \hat\xi_j^{(1)}$. As in 
the data splitting procedure, the estimated weights $w_j(T^{(1)}) = \rho_c(\hat\xi_j)$ depend only on $X^{(1)}$. To 
preserve error control we proceed as follows. 

Theorem 5.2 Assume $T_j^{(k)} \sim N(0, 1)$ independently for $k = 1, 2$. Suppose that weight $w(T^{(1)})$ 
depends only on $X^{(1)}$ but the p-value $P(T_j)$ is allowed to depend on the full data $X$. Define $c(T^{(1)})$ 
to solve 

™f^\ \ ^^b J Jm' (43) 

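Then the procedure that rejects when $P(T_j) \le w_j(T^{(1)})\alpha/m$, where 

$$w_j(T^{(1)}) = \frac{m}{\alpha}\, \bar\Phi\left(\frac{\hat\xi_j}{2} + \frac{c(T^{(1)})}{\hat\xi_j}\right) \tag{44}$$

controls the familywise error at level $\alpha$. 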

5.2 Simulations 

We simulate a study with $m = 1000$ tests, yielding data of the form given in (1). A test of the 
hypothesis $H_j : \xi_j \neq 0$ is performed for each $j$ using $T_j$, which we assume is (approximately) 
normally distributed, or equivalently $T_j^2 \sim \chi^2_1$. In our simulations we generate 50 of the 1000 tests 
under the alternative hypothesis with shift parameter $\xi_j = 2, 3, 4$ or 5. We compare the power 
for various levels of a threshold parameter $\lambda \in (0, .5, 1, 1.5, 2, 2.5)$. We use a fraction $b = 0.5$ of 
the data to construct the weights and we compare four methods for estimating the noncentrality 
parameter: 

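1. The normalized statistic $\hat\xi_j^{(1)} = T_j^{(1)}$. 

2. Hard thresholding: 

$$\hat\xi_j^{(1)} = T_j^{(1)}\, I\{|T_j^{(1)}| > \lambda\}. \tag{45}$$

3. Soft thresholding: 

$$\hat\xi_j^{(1)} = \operatorname{sign}(T_j^{(1)})\,(|T_j^{(1)}| - \lambda)_+. \tag{46}$$

4. The James-Stein estimator 

$$\hat\xi_j^{(1)} = \left(1 - \frac{m - 2}{\sum_{k=1}^m (T_k^{(1)})^2}\right) T_j^{(1)}. \tag{47}$$

To compute the weights we rescale $\hat\xi_j^{(1)}$ by $r_s$ or $r_f$ as appropriate to the followup testing 
strategy. 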
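The four estimators and the two rescaling factors are simple to write down. The sketch below is illustrative only: the function names are hypothetical, and the James-Stein form used here (plain shrinkage toward 0 with factor $1 - (m-2)/\sum_k T_k^2$, without a positive-part correction) is an assumption about the variant intended in (47).

```python
import math
from typing import List, Sequence


def hard_threshold(t: float, lam: float) -> float:
    """Equation (45): keep the statistic only if it exceeds the threshold."""
    return t if abs(t) > lam else 0.0


def soft_threshold(t: float, lam: float) -> float:
    """Equation (46): shrink the statistic toward zero by lam."""
    return math.copysign(max(abs(t) - lam, 0.0), t)


def james_stein(ts: Sequence[float]) -> List[float]:
    """Assumed form of (47): common shrinkage factor 1 - (m - 2) / sum of squares."""
    shrink = 1.0 - (len(ts) - 2) / sum(t * t for t in ts)
    return [shrink * t for t in ts]


def rescale_factor(b: float, full_data: bool) -> float:
    """r_f = b^{-1/2} when testing on the full data, r_s = ((1-b)/b)^{1/2} otherwise."""
    return b ** -0.5 if full_data else math.sqrt((1.0 - b) / b)
```

With $b = 0.5$ as in the simulations, the data splitting factor is $r_s = 1$ and the full-data factor is $r_f = \sqrt{2}$, so the same training estimate maps to a larger noncentrality parameter when all the data are used for testing.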

Power results are displayed in Fig. 7. We first consider the power of the RVD procedure, 
which uses the data splitting strategy and $\lambda = 0$ (Fig. 7, labeled "P" at the origin). Although 
RVD suggest using $\lambda = 0$, we also examine the power of this procedure for a range of values of 
$\lambda$. This extended RVD procedure applies hard-thresholding to estimate $\theta$. Next we consider 
the power of four testing strategies that use the full data $T_j$ for testing rather than data splitting. 
The first approach (B) uses binary weights equal to $m/M$ where $M = \sum_j I\{|T_j^{(1)}| > \lambda\}$. In this 
setting, when $\lambda = 0$ the method reduces to the simple one-stage Bonferroni approach. For $\lambda > 0$ 
it is the method of Skol et al. (2006). The remaining three approaches rely on weights estimated 
using hard-thresholding (H), soft-thresholding (S), or James-Stein (J). For $\lambda = 0$ methods H and S 
reduce to the normed sample mean, which is the RVD approach adapted to incorporate the full data 
in the p-value. Clearly, this adaptation of the RVD method leads to a valuable increase in power. 
For $\lambda > 0$ this method imposes a hard threshold shrinkage effect on the parameter estimates. 
Notice that for any fixed value of $\lambda$, method H gives the best power. In particular, the difference 
in power between methods B and H illustrates the advantage of using variable weights estimated 
from a fraction of the data. Method H and to a lesser extent method B are nearly invariant to $\lambda$ for 
Figure 7: Power of weighted tests. From top left clockwise: $\xi = 2, 3, 4, 5$. Methods compared use 
weights based on hard thresholding (H), soft thresholding (S), binary weights (B), and James-Stein 
(J). 

moderate values of the threshold parameter. In contrast, method S, relying on soft-thresholding, 
experiences a sharp decline in power as $\lambda$ increases. Finally, the James-Stein approach clearly fails 
in this setting, presumably because most tests follow the null hypothesis and hence the true signals 
are shrunk toward 0, which diminishes the power of the procedure. 

For each condition investigated the tests had size less than 0.05 as expected from the theory. 
The James-Stein method was most conservative. 

From this experiment it appears that shrinkage only enhances power when the signal is very 
weak. A more careful analysis reveals that the effect of shrinkage for stronger signals is more 
subtle. As $\xi \to 0$, $\rho_c(\xi) \to 0$. Figure 8 shows $\rho_c(\xi)$ is close to zero for a broad range of values. 
Consequently, for $\lambda < 1.5$, the weight function performs almost the same role as the threshold parameter. Using hard-thresholding for $\lambda < 1.5$ is essentially equivalent to using no threshold 
because a moderate level of shrinkage is automatically imposed by the weight function. Figure 8 
also illustrates how the optimal weights vary with the signal strength (top panel has greater signal 
than bottom panel). Both panels indicate that larger weights are placed in the midrange of signal 
Figure 8: Distribution of weights for two sets of data. 

strength. Essentially no weight is wasted on tests with small signals ($\xi < 1.5$) because these tests 
are not likely to yield significant results. The bottom panel shows that large weights are also not 
wasted on signals so strong that the tests can easily be rejected even without up-weighting ($\xi > 6$). 
The top panel places its largest weights between 2.5 and 4. The bottom panel has fewer signals in 
this range and hence stronger weights can be applied to signals between 2 and 2.5. Both panels 
indicate that near-zero weights would be applied to tests with signals near 0. 

6 Discussion 

An interesting connection can be made between weights based on threshold-estimators and two-stage experimental designs that perform only a subset of the tests in stage two, based on the results 
obtained from stage one. The simplest example of this type of two-stage testing is the two-stage 
Bonferroni procedure, for which the training data $X^{(1)}$ is used to determine the $M$ elements in 
$\Lambda = \{j : |T_j^{(1)}| > \lambda\}$; $X^{(2)}$ is only measured for these columns. A Bonferroni correction with 
$\alpha/(2M)$ controls FWER at level $\alpha$ for two-sided testing in this setting. In essence this approach is 
a weighted test with weights equal to $m/M$ for the elements in $\Lambda$ and zero elsewhere. 
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While the classic two-stage approach uses X^ for training, and X^ for testing, an alternative 
is to use the training data to determine the weights and then use all of the data to conduct the 
tests. This strategy was recently investigated by Skol et al. (2006), using constant weights. These 
authors use the training data to determine A and then apply weights equal to m/M to the M tests 
determined in stage one. This full data approach proved to be considerably more powerful than the 
two-stage Bonferroni approach in simulations. 
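The correspondence between selecting columns in stage one and applying constant weights can be sketched as follows. This is an illustration only: the function names, the threshold `lam`, and the toy statistics and p-values are assumptions made for the example. P-values are taken to be two-sided, so a p-value cutoff of $\alpha/M$ corresponds to $\alpha/(2M)$ in each tail.

```python
def two_stage_weights(train_stats, lam):
    """Weights m/M for columns whose training statistic exceeds lam in
    absolute value, and zero for all other columns."""
    m = len(train_stats)
    selected = [abs(t) > lam for t in train_stats]
    M = sum(selected)
    if M == 0:
        return [0.0] * m
    return [m / M if s else 0.0 for s in selected]

def weighted_bonferroni(pvalues, weights, alpha):
    """Reject hypothesis j when p_j <= alpha * w_j / m; zero weight never rejects."""
    m = len(pvalues)
    return [w > 0 and p <= alpha * w / m for p, w in zip(pvalues, weights)]

# Toy example with m = 8 columns, three of which pass the stage-one threshold.
train = [0.2, 2.5, -3.1, 0.9, 1.1, -0.4, 4.0, 0.0]
pvals = [0.001, 0.02, 0.001, 0.5, 0.5, 0.5, 0.016, 0.5]
weights = two_stage_weights(train, lam=2.0)
rejections = weighted_bonferroni(pvals, weights, alpha=0.05)
print(weights)
print(rejections)
```

The weights average to one by construction, and each selected column is tested at $\alpha w_j/m = \alpha/M$, which is the two-stage Bonferroni cutoff. Note that the first column is not rejected despite its small p-value because it was not selected in stage one.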

For hard and soft-thresholding, $\hat\xi_j = 0$ for any $|T_j^{(1)}| < \lambda$. From the optimal
weight formula it follows that the weights for any test with $\hat\xi_j = 0$ are 0 and the rejection
region is $z_0 = \infty$. Hence, a procedure using $w_j = 0$ for columns with $\hat\xi_j = 0$ is
equivalent to a truncation procedure that tests only those columns in $A$. In practice, $\lambda$ can be
chosen to optimize power or to constrain the experimental budget. It is worth noting that in some
experimental settings, such as those described by Skol et al., this experimental design can lead to
considerable savings of effort and resources. Our results suggest that these savings can be achieved
without losing measurable power.

The same ideas used here can be applied to other testing methods to improve power. In
particular, weights can be added to the FDR method, Holm's stepdown test, and the Donoho-Jin (2004)
method. Weighting ideas can also be used for confidence intervals. We plan to present the details
for the other methods in a follow-up paper. Another item to be addressed in future work is the
connection with Bayesian methods.

As we noted, using weights is equivalent to using a separate rejection cutoff for each statistic. 
The methods of Storey (2005) and Signorovitch (2006) find optimal cutoffs when the cutoffs are
constrained. There is undoubtedly a bias-variance tradeoff. These constrained methods can
estimate optimal cutoffs well (low variance) but they will not achieve the oracle power obtained here
since they are by design biased away from these separate cutoffs. Future work should be directed 
at comparing these approaches and developing methods that lie in between these extremes. 

7 Appendix 

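Proof of Theorem 3.1. Let $A$ denote the set of hypotheses with $\xi_j > 0$. Power is optimized if
$w_j = 0$ for $j \notin A$. The average power is

$$\frac{1}{m}\sum_{j \in A} \pi(w_j, \xi_j)
= \frac{1}{m}\sum_{j \in A} \overline{\Phi}\left(\overline{\Phi}^{-1}\left(\frac{\alpha w_j}{m}\right) - \xi_j\right)$$

with constraint

$$\sum_{j \in A} w_j = m.$$

Choose $w$ to maximize

$$\frac{1}{m}\sum_{j \in A} \overline{\Phi}\left(\overline{\Phi}^{-1}\left(\frac{\alpha w_j}{m}\right) - \xi_j\right)
- \lambda\left(\sum_{j \in A} w_j - m\right)$$

by setting the derivative to zero

$$\frac{\partial}{\partial w_j} = -\lambda + \frac{\alpha}{m^2}\,
\frac{\phi\left(\overline{\Phi}^{-1}(\alpha w_j/m) - \xi_j\right)}{\phi\left(\overline{\Phi}^{-1}(\alpha w_j/m)\right)} = 0,$$

that is,

$$\frac{m^2 \lambda}{\alpha} =
\frac{\phi\left(\overline{\Phi}^{-1}(\alpha w_j/m) - \xi_j\right)}{\phi\left(\overline{\Phi}^{-1}(\alpha w_j/m)\right)}.$$

The $w$ that solves these equations is the weight function stated in the theorem. Finally, solve
for $c$ such that $\sum_i w_i = m$. ■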
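The step from the stationarity condition to the closed-form weights can be made explicit. The following worked derivation is a sketch under the one-sided Gaussian model; the constant $\kappa$ (the common value of the density ratio fixed by the Lagrange multiplier) and the shorthand $z_j$ are notation introduced here for illustration.

```latex
\[
z_j = \overline{\Phi}^{-1}\!\left(\frac{\alpha w_j}{m}\right),
\qquad
\frac{\phi(z_j - \xi_j)}{\phi(z_j)}
  = \exp\!\left(-\tfrac{1}{2}(z_j - \xi_j)^2 + \tfrac{1}{2} z_j^2\right)
  = \exp\!\left(z_j \xi_j - \tfrac{1}{2}\xi_j^2\right).
\]
% Setting the ratio equal to a common constant \kappa > 0 for every j in A
% and writing c = \log\kappa:
\[
z_j \xi_j - \tfrac{1}{2}\xi_j^2 = c
\quad\Longrightarrow\quad
z_j = \frac{\xi_j}{2} + \frac{c}{\xi_j}
\quad\Longrightarrow\quad
w_j = \frac{m}{\alpha}\,\overline{\Phi}\!\left(\frac{\xi_j}{2} + \frac{c}{\xi_j}\right).
\]
```

The constant $c$ is then fixed by the constraint that the weights sum to $m$.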

Proof of Lemma 1331 Choose K > 1 such that \/{K + 1) < l/o - e. Choose 1 > 7 > 
(2a - o)/if . Choose a small c> 0. Let £ = A + V-4 2 - 2c and « = 5 - V^ 2 - 2c where 

A = IT 1 ( 1 1 B = &- 1 ( — 1 (48) 

V (m(jK + a)))' \(m(jK + a))J K } 

Then p(£) = l/o and p(£) = 1/(K + 1). Now d(Q, Q) = 7. Taking K sufficiently large and 7 
sufficiently close to (2a — a)/K makes 7 < 5. ■ 



Proof of Theorem 3.5. The first statement follows easily by noting that the worst case
corresponds to choosing weight $B$ in the first term in $R(\xi)$ and choosing weight $b$ in the second
term in $R(\xi)$. The rest follows by Taylor expanding $R_{b,B}(\xi)$ around $b = 1$. ■

Proof of Lemma 3.6. With $b = 0$, $R_{b,B}(\xi) > 0$ when

$$\overline{\Phi}(z_{B\alpha/m} - \xi) - 2\,\overline{\Phi}(z_{\alpha/m} - \xi) > 0. \tag{49}$$

With $B > 2$, (49) holds at $\xi = 0$. The left hand side is increasing in $\xi$ for $\xi$ near 0 but (49)
does not hold at $\xi = z_{\alpha/m}$. So (49) must hold in the interval $[0, \xi_*]$. Rewrite (49) as
$\overline{\Phi}(z_{B\alpha/m} - \xi) - \overline{\Phi}(z_{\alpha/m} - \xi) > \overline{\Phi}(z_{\alpha/m} - \xi)$.
We lower bound the left hand side and upper bound the right hand side.
The left hand side is

$$\overline{\Phi}(z_{B\alpha/m} - \xi) - \overline{\Phi}(z_{\alpha/m} - \xi)
= \int_{z_{B\alpha/m} - \xi}^{z_{\alpha/m} - \xi} \phi(u)\,du
> (z_{\alpha/m} - z_{B\alpha/m})\,\phi(z_{\alpha/m} - \xi).$$

The right hand side can be bounded using Mills' ratio:
$\overline{\Phi}(z_{\alpha/m} - \xi) < \phi(z_{\alpha/m} - \xi)/(z_{\alpha/m} - \xi)$.
Set the lower bound greater than the upper bound to obtain the stated result. ■
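The sign pattern of the left hand side of (49) used in this argument can be checked numerically. The sketch below uses only the Python standard library; the values of `m`, `alpha` and `B` and the helper names are illustrative assumptions.

```python
import math
from statistics import NormalDist

def upper_tail(x):
    """Standard normal upper tail probability, Phi-bar(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def upper_quantile(p):
    """z_p such that Phi-bar(z_p) = p, for 0 < p < 1."""
    return -NormalDist().inv_cdf(p)

def lhs_49(xi, B, alpha, m):
    """Left hand side of (49): Phi-bar(z_{B alpha/m} - xi) - 2 Phi-bar(z_{alpha/m} - xi)."""
    z_B = upper_quantile(B * alpha / m)
    z_1 = upper_quantile(alpha / m)
    return upper_tail(z_B - xi) - 2.0 * upper_tail(z_1 - xi)

m, alpha = 1000, 0.05
z_1 = upper_quantile(alpha / m)
print(lhs_49(0.0, 4.0, alpha, m))   # equals (B - 2) * alpha / m at xi = 0
print(lhs_49(z_1, 4.0, alpha, m))   # negative at xi = z_{alpha/m}
print(lhs_49(0.0, 1.5, alpha, m))   # negative at xi = 0 when B < 2
```

At $\xi = 0$ the expression reduces to $(B - 2)\alpha/m$, which is why $B > 2$ is needed for (49) to hold there.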



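It is convenient to prove Theorem 3.8 before proving Theorem 3.7.

Proof of Theorem 3.8. Let $c_*$ solve

$$\gamma\,\overline{\Phi}(\sqrt{2c}) + a\,\overline{\Phi}\left(\frac{\xi}{2} + \frac{c}{\xi}\right) = \frac{\alpha}{m}. \tag{50}$$

We claim first that for any $c > c_*$, there is no $u$ such that the weights average to 1. Fix
$c > c_*$. The weights average to 1 if and only if

$$\gamma\,\overline{\Phi}\left(\frac{u}{2} + \frac{c}{u}\right) + a\,\overline{\Phi}\left(\frac{\xi}{2} + \frac{c}{\xi}\right) = \frac{\alpha}{m}. \tag{51}$$

Since $c > c_*$, and since the second term is decreasing in $c$, we must have

$$\overline{\Phi}\left(\frac{c}{u} + \frac{u}{2}\right) > \overline{\Phi}(\sqrt{2c_*}). \tag{52}$$

The function $r(u) = \overline{\Phi}(c/u + u/2)$ is maximized at $u = \sqrt{2c}$. So
$r(\sqrt{2c}) \ge r(u)$. But $r(\sqrt{2c}) = \overline{\Phi}(\sqrt{2c})$. Hence
$\overline{\Phi}(\sqrt{2c}) \ge r(u) > \overline{\Phi}(\sqrt{2c_*})$. This implies $c < c_*$ which is a
contradiction. This establishes that $\sup_u c(u) \le c_*$. On the other hand, taking $c = c_*$ and
$u = \sqrt{2c_*}$ solves equation (51). Thus $c_*$ is indeed the largest $c$ that solves the equation,
which establishes the first claim. The second claim follows by noting that

$$\gamma\,\overline{\Phi}(\sqrt{2c}) + a\,\overline{\Phi}\left(\frac{\xi}{2} + \frac{c}{\xi}\right)
= \gamma\,\overline{\Phi}(\sqrt{2c}) + O(a). \tag{53}$$

Now set this expression equal to $\alpha/m$ and solve. ■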
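The two claims in this proof can be illustrated numerically. The following sketch solves equation (50) for $c_*$ by bisection, checks that $u = \sqrt{2c_*}$ maximizes $\overline{\Phi}(c_*/u + u/2)$ over a few trial values of $u$, and compares $c_*$ with the approximation $z^2_{\alpha/(m\gamma)}/2$ implied by (53). The parameter values and helper names are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def upper_tail(x):
    """Standard normal upper tail probability, Phi-bar(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def upper_quantile(p):
    """z_p such that Phi-bar(z_p) = p, for 0 < p < 1."""
    return -NormalDist().inv_cdf(p)

def solve_c_star(gamma, a, xi, alpha, m, hi=200.0, iters=200):
    """Bisect for c in [0, hi] with gamma*Phi-bar(sqrt(2c)) + a*Phi-bar(xi/2 + c/xi) = alpha/m.

    Assumes the left hand side exceeds alpha/m at c = 0 and is below it at c = hi.
    """
    target = alpha / m

    def r(c):
        return (gamma * upper_tail(math.sqrt(2.0 * c))
                + a * upper_tail(xi / 2.0 + c / xi))

    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if r(mid) > target:   # r is decreasing in c
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

gamma, a, xi, alpha, m = 0.01, 1e-4, 3.0, 0.05, 1000
c_star = solve_c_star(gamma, a, xi, alpha, m)
u_star = math.sqrt(2.0 * c_star)

def first_term(u):
    """Phi-bar(c_star/u + u/2), the u-dependent factor in the weight constraint."""
    return upper_tail(c_star / u + u / 2.0)

approx = upper_quantile(alpha / (m * gamma)) ** 2 / 2.0
print(c_star, approx)
print([first_term(u) <= first_term(u_star) for u in (0.5, 1.0, 2.0, 3.0, 5.0)])
```

With a small value of $a$, the exact $c_*$ sits slightly above the approximation because the $O(a)$ term is positive.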

Proof of Theorem 3.7. Define $c_*$ as in (50). If $u_* = \sqrt{2c_*} \le \xi$ then the proof proceeds
as in the previous proof. So we first need to establish for which values of $\xi$ this is true. Let
$r(c) = \gamma\,\overline{\Phi}(\sqrt{2c}) + a\,\overline{\Phi}(\xi/2 + c/\xi)$. We want to find out when
the solution of $r(c) = \alpha/m$ is such that $\sqrt{2c} \le \xi$, or equivalently, $c \le \xi^2/2$. Now
$r$ is decreasing in $c$. Since $\gamma + a > \alpha/m$, $r(-\infty) > \alpha/m$. Hence there is a solution
with $c \le \xi^2/2$ if and only if $r(\xi^2/2) \le \alpha/m$. But $r(\xi^2/2) = (\gamma + a)\overline{\Phi}(\xi)$
so we conclude that there is such a solution if and only if $(\gamma + a)\overline{\Phi}(\xi) \le \alpha/m$,
that is, $\xi \ge z_{\alpha/(m(\gamma + a))} \equiv \xi_0$.

Now suppose that $\xi < \xi_0$. We need to find $u \le \xi$ to make $c$ as large as possible in the
equation $v(u, c) = \gamma\,\overline{\Phi}(u/2 + c/u) + a\,\overline{\Phi}(\xi/2 + c/\xi) = \alpha/m$.
Let $u_* = \xi$ and $c_* = \xi z_{\alpha/(m(\gamma + a))} - \xi^2/2$. By direct substitution,
$v(u_*, c_*) = \alpha/m$ for this choice of $u$ and $c$ and clearly $u_* \le \xi$ as required. We claim
that this is the largest possible $c$. To see this, suppose some $c > c_*$ and $u \le \xi$ satisfy
$v(u, c) = \alpha/m$, and note that $v(u, c) < v(u, c_*)$. For $\xi < \xi_0$ we have
$\sqrt{2c_*} > \xi$, so $v(u, c_*)$ is an increasing function of $u$ on $(0, \xi]$. Hence,
$v(u, c) < v(u, c_*) \le v(u_*, c_*) = \alpha/m$. This contradicts the fact that $v(u, c) = \alpha/m$.

For the second claim, note that the power of the weighted test beats the power of Bonferroni if
and only if the weight $w = (m/\alpha)\overline{\Phi}(\xi/2 + C(\xi)/\xi) \ge 1$ which is equivalent to

$$C(\xi) \le \xi z_{\alpha/m} - \xi^2/2. \tag{54}$$

When $\xi \le \xi_0$, $C(\xi) = \xi \xi_0 - \xi^2/2$. By assumption, $\gamma + a \le 1$ so that
$z_{\alpha/(m(\gamma + a))} \le z_{\alpha/m}$ and (54) holds. Now suppose that $\xi_0 < \xi < \xi_*$. Then
$C(\xi)$ is the solution to
$r(c) = \gamma\,\overline{\Phi}(\sqrt{2c}) + a\,\overline{\Phi}(\xi/2 + c/\xi) = \alpha/m$.
We claim that (54) still holds. Suppose not. Then, since $r(c)$ is decreasing in $c$,
$r(\xi z_{\alpha/m} - \xi^2/2) > r(C(\xi)) = \alpha/m$. But, by direct calculation,
$r(\xi z_{\alpha/m} - \xi^2/2) > \alpha/m$ implies that $\xi > \xi_*$ which is a contradiction. Thus
(54) holds.

Finally, we turn to (29). In this case, $C(\xi) = z^2_{\alpha/(m\gamma)}/2 + O(a)$. The worst case
power is
$\overline{\Phi}(C(\xi)/\xi - \xi/2) = \overline{\Phi}(z^2_{\alpha/(m\gamma)}/(2\xi) - \xi/2) + O(a)$.
The latter is increasing in $\xi$ and so is at least
$\overline{\Phi}(z^2_{\alpha/(m\gamma)}/(2\xi_*) - \xi_*/2) + O(a) = \overline{\Phi}((z^2_{\alpha/(m\gamma)} - \xi_*^2)/(2\xi_*)) + O(a)$
as claimed. The next two equations follow from standard tail approximations for Gaussians.
Specifically, a Gaussian quantile $z_{\beta/m}$ can be written as
$z_{\beta/m} = \sqrt{2\log(m L_m/\beta)}$ where $L_m = c\log^a(m)$ for constants $a$
and $c$ (Donoho and Jin 2004). Inserting this into the previous expression yields the final expression.
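The tail approximation for Gaussian quantiles can be examined numerically. This sketch compares the exact upper quantile $z_{\beta/m}$ with the leading-order term $\sqrt{2\log(m/\beta)}$ and reports the value of $L_m$ implied by $z_{\beta/m} = \sqrt{2\log(m L_m/\beta)}$. The choice of `beta`, the grid of `m` values, and the function names are illustrative assumptions.

```python
import math
from statistics import NormalDist

def upper_quantile(p):
    """z_p such that the standard normal upper tail probability at z_p is p."""
    return -NormalDist().inv_cdf(p)

def quantile_table(beta, ms):
    """For each m, return (m, exact quantile, leading-order term, implied L_m)."""
    rows = []
    for m in ms:
        exact = upper_quantile(beta / m)
        leading = math.sqrt(2.0 * math.log(m / beta))
        # Solve exact = sqrt(2 log(m L / beta)) for L.
        implied_L = (beta / m) * math.exp(exact ** 2 / 2.0)
        rows.append((m, exact, leading, implied_L))
    return rows

rows = quantile_table(beta=0.05, ms=(10**2, 10**4, 10**6))
for m, exact, leading, implied_L in rows:
    print(m, round(exact, 4), round(leading, 4), round(implied_L, 4))
```

The leading-order term overshoots the exact quantile, and the ratio of the two moves toward one as $m$ grows, while the implied $L_m$ changes only slowly with $m$.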



Proof of Theorem 4.2. Setting $\pi(w, \xi_m) = \overline{\Phi}(\overline{\Phi}^{-1}(w\alpha/m) - \xi_m)$
equal to $1 - \beta$ implies $w = (m/\alpha)\overline{\Phi}(z_{1-\beta} + \xi_m)$ which is equal to $w_1$ as
stated in the theorem. The stated form of $w_0$ implies that the weights average to 1. The stated
solution thus satisfies the restriction that a fraction $\epsilon$ have power at least $1 - \beta$.
Increasing the weight of any hypothesis whose weight is $w_0$ necessitates reducing the weight of
another hypothesis. This either reduces the minimum power or forces a hypothesis with power
$1 - \beta$ to fall below $1 - \beta$. Hence, the stated solution does in fact maximize the minimum
power. ■
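The first step of this argument, solving the power equation for the weight, can be checked directly. The sketch below assumes the one-sided power function $\pi(w, \xi) = \overline{\Phi}(\overline{\Phi}^{-1}(w\alpha/m) - \xi)$ and solves $\pi(w, \xi) = 1 - \beta$ for $w$; the parameter values and function names are illustrative assumptions.

```python
import math
from statistics import NormalDist

def upper_tail(x):
    """Standard normal upper tail probability, Phi-bar(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def upper_quantile(p):
    """z_p such that Phi-bar(z_p) = p, for 0 < p < 1."""
    return -NormalDist().inv_cdf(p)

def power(w, xi, alpha, m):
    """Power of the weighted one-sided test; requires 0 < w * alpha / m < 1."""
    return upper_tail(upper_quantile(w * alpha / m) - xi)

def weight_for_power(beta, xi, alpha, m):
    """Weight w solving power(w, xi) = 1 - beta."""
    return (m / alpha) * upper_tail(upper_quantile(1.0 - beta) + xi)

m, alpha, xi, beta = 1000, 0.05, 3.0, 0.2
w = weight_for_power(beta, xi, alpha, m)
print(w, power(w, xi, alpha, m))        # weighted power equals 1 - beta
print(power(1.0, xi, alpha, m))         # unweighted (Bonferroni) power
```

In this example the required weight is far above one, which illustrates why only a fraction of the hypotheses can be given this level of power when the weights must average to one.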



The proof of Theorem 4.3 is similar to the previous proof and is omitted.

Proof of Theorem 5.2. The familywise error is

,(2)\ . w 3 {Tf)a 



F(n nn ) < J2 F [ p 3 ( T i (1) ' T f) - 